Introduction

The loan data comes from Prosper. The data set contains 113,937 loans with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, borrower employment status, borrower credit history, and the latest payment information. The data set was last updated on 3/11/2014.

The outline of this project is:

1. Introduction

2. Univariate Plots Section

3. Univariate Analysis

4. Bivariate Plots Section

5. Bivariate Analysis

6. Multivariate Plots Section

7. Multivariate Analysis

8. Final Plots and Summary

9. Reflection

The structure of the dataset

The data set needs cleaning because it contains many NA values, so the NA values are removed.

Exploring the dataset: it has 81 columns.

## [1] 81
##  [1] "ListingKey"                         
##  [2] "ListingNumber"                      
##  [3] "ListingCreationDate"                
##  [4] "CreditGrade"                        
##  [5] "Term"                               
##  [6] "LoanStatus"                         
##  [7] "ClosedDate"                         
##  [8] "BorrowerAPR"                        
##  [9] "BorrowerRate"                       
## [10] "LenderYield"                        
## [11] "EstimatedEffectiveYield"            
## [12] "EstimatedLoss"                      
## [13] "EstimatedReturn"                    
## [14] "ProsperRating..numeric."            
## [15] "ProsperRating..Alpha."              
## [16] "ProsperScore"                       
## [17] "ListingCategory..numeric."          
## [18] "BorrowerState"                      
## [19] "Occupation"                         
## [20] "EmploymentStatus"                   
## [21] "EmploymentStatusDuration"           
## [22] "IsBorrowerHomeowner"                
## [23] "CurrentlyInGroup"                   
## [24] "GroupKey"                           
## [25] "DateCreditPulled"                   
## [26] "CreditScoreRangeLower"              
## [27] "CreditScoreRangeUpper"              
## [28] "FirstRecordedCreditLine"            
## [29] "CurrentCreditLines"                 
## [30] "OpenCreditLines"                    
## [31] "TotalCreditLinespast7years"         
## [32] "OpenRevolvingAccounts"              
## [33] "OpenRevolvingMonthlyPayment"        
## [34] "InquiriesLast6Months"               
## [35] "TotalInquiries"                     
## [36] "CurrentDelinquencies"               
## [37] "AmountDelinquent"                   
## [38] "DelinquenciesLast7Years"            
## [39] "PublicRecordsLast10Years"           
## [40] "PublicRecordsLast12Months"          
## [41] "RevolvingCreditBalance"             
## [42] "BankcardUtilization"                
## [43] "AvailableBankcardCredit"            
## [44] "TotalTrades"                        
## [45] "TradesNeverDelinquent..percentage." 
## [46] "TradesOpenedLast6Months"            
## [47] "DebtToIncomeRatio"                  
## [48] "IncomeRange"                        
## [49] "IncomeVerifiable"                   
## [50] "StatedMonthlyIncome"                
## [51] "LoanKey"                            
## [52] "TotalProsperLoans"                  
## [53] "TotalProsperPaymentsBilled"         
## [54] "OnTimeProsperPayments"              
## [55] "ProsperPaymentsLessThanOneMonthLate"
## [56] "ProsperPaymentsOneMonthPlusLate"    
## [57] "ProsperPrincipalBorrowed"           
## [58] "ProsperPrincipalOutstanding"        
## [59] "ScorexChangeAtTimeOfListing"        
## [60] "LoanCurrentDaysDelinquent"          
## [61] "LoanFirstDefaultedCycleNumber"      
## [62] "LoanMonthsSinceOrigination"         
## [63] "LoanNumber"                         
## [64] "LoanOriginalAmount"                 
## [65] "LoanOriginationDate"                
## [66] "LoanOriginationQuarter"             
## [67] "MemberKey"                          
## [68] "MonthlyLoanPayment"                 
## [69] "LP_CustomerPayments"                
## [70] "LP_CustomerPrincipalPayments"       
## [71] "LP_InterestandFees"                 
## [72] "LP_ServiceFees"                     
## [73] "LP_CollectionFees"                  
## [74] "LP_GrossPrincipalLoss"              
## [75] "LP_NetPrincipalLoss"                
## [76] "LP_NonPrincipalRecoverypayments"    
## [77] "PercentFunded"                      
## [78] "Recommendations"                    
## [79] "InvestmentFromFriendsCount"         
## [80] "InvestmentFromFriendsAmount"        
## [81] "Investors"
##                  ListingKey ListingNumber ListingCreationDate CreditGrade
## 139 11273541569159931E84F17        569000             22:33.4            
## 180 0F1E35343868130956BD68F        544844             50:26.0            
##     Term LoanStatus      ClosedDate BorrowerAPR BorrowerRate LenderYield
## 139   36  Defaulted 20/09/2012 0:00     0.33973       0.2999      0.2899
## 180   36  Defaulted 20/08/2012 0:00     0.34731       0.3073      0.2973
##     EstimatedEffectiveYield EstimatedLoss EstimatedReturn
## 139                  0.2766         0.149          0.1276
## 180                  0.2837         0.149          0.1347
##     ProsperRating..numeric. ProsperRating..Alpha. ProsperScore
## 139                       2                     E            3
## 180                       2                     E            1
##     ListingCategory..numeric. BorrowerState        Occupation
## 139                         6            KY Military Enlisted
## 180                         2            MN    Postal Service
##     EmploymentStatus EmploymentStatusDuration IsBorrowerHomeowner
## 139         Employed                      126                TRUE
## 180         Employed                       87                TRUE
##     CurrentlyInGroup GroupKey DateCreditPulled CreditScoreRangeLower
## 139            FALSE          06/03/2012 11:00                   620
## 180            FALSE           16/12/2011 3:50                   660
##     CreditScoreRangeUpper FirstRecordedCreditLine CurrentCreditLines
## 139                   639         20/04/2001 0:00                  7
## 180                   679         11/03/1999 0:00                 16
##     OpenCreditLines TotalCreditLinespast7years OpenRevolvingAccounts
## 139               8                         30                     2
## 180              15                         33                    14
##     OpenRevolvingMonthlyPayment InquiriesLast6Months TotalInquiries
## 139                          25                    5              5
## 180                         343                   11             34
##     CurrentDelinquencies AmountDelinquent DelinquenciesLast7Years
## 139                    2             1890                      23
## 180                    1                0                       0
##     PublicRecordsLast10Years PublicRecordsLast12Months
## 139                        0                         0
## 180                        0                         0
##     RevolvingCreditBalance BankcardUtilization AvailableBankcardCredit
## 139                     72                0.07                     928
## 180                   4752                0.23                    8306
##     TotalTrades TradesNeverDelinquent..percentage. TradesOpenedLast6Months
## 139          27                               0.75                       1
## 180          32                               0.96                       5
##     DebtToIncomeRatio    IncomeRange IncomeVerifiable StatedMonthlyIncome
## 139              0.35 $25,000-49,999             TRUE            3750.000
## 180              0.13 $50,000-74,999             TRUE            4583.333
##                     LoanKey TotalProsperLoans TotalProsperPaymentsBilled
## 139 A6773646313973238A33299                 1                          3
## 180 44BC36372930801559159FD                 1                          2
##     OnTimeProsperPayments ProsperPaymentsLessThanOneMonthLate
## 139                     3                                   0
## 180                     1                                   1
##     ProsperPaymentsOneMonthPlusLate ProsperPrincipalBorrowed
## 139                               0                     2000
## 180                               0                     4500
##     ProsperPrincipalOutstanding ScorexChangeAtTimeOfListing
## 139                           0                         -36
## 180                           0                         -17
##     LoanCurrentDaysDelinquent LoanFirstDefaultedCycleNumber
## 139                       121                             6
## 180                       170                             8
##     LoanMonthsSinceOrigination LoanNumber LoanOriginalAmount
## 139                         24      62391               3000
## 180                         27      57647               5500
##     LoanOriginationDate LoanOriginationQuarter               MemberKey
## 139     21/03/2012 0:00                Q1 2012 87C83528199783859742DC3
## 180     20/12/2011 0:00                Q4 2011 B64B35063311601836A9F9B
##     MonthlyLoanPayment LP_CustomerPayments LP_CustomerPrincipalPayments
## 139             127.34              127.34                        23.82
## 180             235.69              707.07                       292.65
##     LP_InterestandFees LP_ServiceFees LP_CollectionFees
## 139             103.52          -5.90                 0
## 180             414.42         -13.48                 0
##     LP_GrossPrincipalLoss LP_NetPrincipalLoss
## 139               2976.18                0.00
## 180               5207.35             5207.35
##     LP_NonPrincipalRecoverypayments PercentFunded Recommendations
## 139                          764.27             1               0
## 180                            0.00             1               0
##     InvestmentFromFriendsCount InvestmentFromFriendsAmount Investors
## 139                          0                           0        31
## 180                          0                           0        45
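
The cleaning and inspection steps described above can be sketched in R roughly as follows. This is a minimal illustration on a tiny invented data frame, not the project's actual code: the object names `loans` and `loans_clean` and all values are assumptions, and `na.omit()` is only one plausible way to drop the NA values.

```r
# Toy stand-in for the loan data; the values are invented for illustration only.
loans <- data.frame(
  Term         = c(36, 36, NA, 60),
  LoanStatus   = c("Defaulted", "Current", "Completed", NA),
  BorrowerRate = c(0.2999, 0.3073, 0.15, 0.21),
  stringsAsFactors = FALSE
)

# Drop every row that has at least one NA in any column.
loans_clean <- na.omit(loans)

ncol(loans_clean)      # number of columns
names(loans_clean)     # column names
head(loans_clean, 2)   # first two remaining rows (original row names are kept)
```

Because `na.omit()` keeps the original row names, the first rows printed after cleaning need not be numbered 1 and 2. Note that it only removes true NA values, so empty strings would have to be converted to NA first if they should count as missing.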

Chosen Variables

1. BorrowerState.
2. LoanStatus.
3. Term.
4. BorrowerAPR.
5. BorrowerRate.
6. IncomeRange.
7. ListingCategory.
8. Recommendations.
9. TotalProsperLoans.
10. StatedMonthlyIncome.
11. IsBorrowerHomeowner.
12. MonthlyLoanPayment.
13. LoanOriginationQuarter.

Univariate Plots Section

Number of borrowers by state

BorrowerState: The two-letter abbreviation of the state of the address of the borrower at the time the listing was created.

The states with the most borrowers are Arizona, Texas, Georgia, Florida and New York. Several of these states are also among the most populous in the USA.

The current status of the loan: Cancelled, Chargedoff, Completed, Current, Defaulted, FinalPaymentInProgress, Past Due (1-15 days), Past Due (16-30 days), Past Due (31-60 days), Past Due (61-90 days), and Past Due (91-120 days)

This plot describes the loan status. Past due is split into six categories, which together account for around 2% of loans. Most borrowers try to complete their loans on time, because a delay may expose them to paying more.

IncomeRange: The income range of the borrower at the time the listing was created.

Most borrowers have an income between 25,000 USD and 74,999 USD.

Term: The length of the loan expressed in months.

There are three term options: 12, 36 or 60 months. Loans with a 36-month term are the most popular at 75.2%, followed by 60-month loans at 23.3%. Only 1.5% of borrowers take 12-month loans, probably because that is a short period in which to complete a loan.
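
Percentages like these can be derived from a frequency table. The sketch below is a hedged illustration in base R using a short invented vector called `term` (an assumed name); it shows the technique only and does not reproduce the figures above.

```r
# Made-up Term values (months); the real column has one entry per loan.
term <- c(36, 36, 36, 60, 12, 36, 60, 36)

# Count each term, convert the counts to shares, and express them as percentages.
term_counts <- table(term)
term_pct <- round(100 * prop.table(term_counts), 1)

term_pct
```

The same pattern applies to any categorical variable in the report, such as LoanStatus or IncomeRange.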

ListingCategory: The category of the listing that the borrower selected when posting their listing: 0 - Not Available, 1 - Debt Consolidation, 2 - Home Improvement, 3 - Business, 4 - Personal Loan, 5 - Student Use, 6 - Auto, 7- Other, 8 - Baby&Adoption, 9 - Boat, 10 - Cosmetic Procedure, 11 - Engagement Ring, 12 - Green Loans, 13 - Household Expenses, 14 - Large Purchases, 15 - Medical/Dental, 16 - Motorcycle, 17 - RV, 18 - Taxes, 19 - Vacation, 20 - Wedding Loans

This plot describes the loan categories, that is, the purpose for which borrowers take out their loans. The most common purpose is Debt Consolidation, followed by Home Improvement.

Recommendations: Number of recommendations the borrower had at the time the listing was created.

As the plot shows, most people take out a loan without any recommendations.

BorrowerAPR: The Borrower’s Annual Percentage Rate (APR) for the loan. An annual percentage rate (APR) is the annual rate charged for borrowing or earned through an investment. APR is expressed as a percentage that represents the actual yearly cost of funds over the term of a loan.

BorrowerRate: The Borrower’s interest rate for this loan.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0799  0.2289  0.2925  0.2823  0.3473  0.4135
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0699  0.2005  0.2610  0.2507  0.3099  0.3600

Both distributions look approximately normal, and BorrowerAPR is generally higher than BorrowerRate (for example, the medians in the summaries above are 0.2925 and 0.2610).

TotalProsperLoans: Number of Prosper loans the borrower had at the time they created this listing. This value will be null if the borrower had no prior loans.

About 20.6% of borrowers already have one prior loan and come back for another, 6% have two loans and 2% have three, while more than 70% of borrowers have no prior loans. The share of borrowers holding several loans at the same time is low; those with more than three loans make up less than 1%.

StatedMonthlyIncome: The monthly income the borrower stated at the time the listing was created.

The first plot of StatedMonthlyIncome has a long right tail, so I applied a log10 transformation; in the second plot the transformed values are approximately normally distributed.
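
To illustrate why a log10 transformation helps with a long-tailed variable, here is a small base R sketch. The `income` vector is invented for the example, and filtering to positive values is an assumption about how zero incomes might be handled, not a description of the original code.

```r
# Made-up monthly incomes with one very large value to mimic a long right tail.
income <- c(1500, 2500, 3750, 4583, 6000, 9000, 150000)

# log10 is undefined for zero or negative values, so keep strictly positive incomes.
income_pos <- income[income > 0]
log_income <- log10(income_pos)

# The raw mean is pulled far above the median by the outlier;
# on the log scale the two are much closer together.
summary(income)
summary(log_income)
```

When plotting with ggplot2, a similar effect can be obtained by adding `scale_x_log10()` to a histogram instead of transforming the column beforehand.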

IsBorrowerHomeowner: A Borrower will be classified as a homeowner if they have a mortgage on their credit profile or provide documentation confirming they are a homeowner.

The plot shows the percentage of borrowers who do or do not own a home. The difference between the two groups is not large: 53.38% of borrowers own a home and 46.62% do not, a gap of about 6.8 percentage points.

MonthlyLoanPayment: The scheduled monthly loan payment.

LoanOriginationQuarter: The quarter in which the loan was originated.

2013 is the year in which the largest number of loans originated: three of the four highest peaks in the plot fall in 2013.

Univariate Analysis

The structure of your dataset

This data set contains 113,937 loans with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, borrower employment status, borrower credit history, and the latest payment information.

The main features of interest in the dataset

ListingCategory, or the purpose of the loan: shows the reasons borrowers seek a loan, and so which purposes are the most common.
BorrowerState: Prosper serves borrowers across the US, and analyzing this variable gives the number of borrowers in each state.
TotalProsperLoans: with this variable we can estimate the percentage of people who have already received a loan and come back for another.

The features I investigated

I removed all null or NA values from the dataset. In TotalProsperLoans, a null value means the borrower has no prior loans, but in my investigation I only wanted to know the percentage of people who have prior loans and come back for another.
I also converted ListingCategory..numeric. to a factor.
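
One way to carry out such a conversion is sketched below. The labels are taken from the ListingCategory definition quoted in the Univariate Plots Section; the vector `category_code` and its values are invented, and the original project may have done the conversion differently.

```r
# Made-up category codes; the real column holds one code (0 to 20) per loan.
category_code <- c(1, 2, 6, 1, 7, 0)

# Labels in code order, as listed in the ListingCategory definition.
category_labels <- c(
  "Not Available", "Debt Consolidation", "Home Improvement", "Business",
  "Personal Loan", "Student Use", "Auto", "Other", "Baby&Adoption", "Boat",
  "Cosmetic Procedure", "Engagement Ring", "Green Loans", "Household Expenses",
  "Large Purchases", "Medical/Dental", "Motorcycle", "RV", "Taxes",
  "Vacation", "Wedding Loans"
)

# levels = 0:20 matches the codes; labels gives each code a readable name.
category <- factor(category_code, levels = 0:20, labels = category_labels)

# Tabulate only the categories that actually occur in the toy data.
table(droplevels(category))
```

Supplying the labels at conversion time means plots and tables show the loan purpose by name rather than by number.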

Bivariate Plots Section

Loan Status and Income Range
Loan Status and MonthlyLoanPayment
LoanStatus and IsBorrowerHomeowner
IncomeRange and ListingCategory
The BorrowerRate and AvailableBankcardCredit
The StatedMonthlyIncome and MonthlyLoanPayment
The StatedMonthlyIncome and Occupation

Loan Status and Income Range

IncomeRange: The income range of the borrower at the time the listing was created. The current status of the loan: Cancelled, Chargedoff, Completed, Current, Defaulted, FinalPaymentInProgress, PastDue.

Most borrowers are current on or have completed their loans, and some are charged off or past due. Borrowers with a high income (100,000 USD or more) and borrowers with a medium income (between 25,000 USD and 74,999 USD) hold the most loans, which in my view is a large loan volume relative to their monthly income. The relationship appears to be that borrowers with a high income can take loans and complete them on time, whereas borrowers in a low income range may not be able to complete the loan on time.

ListingCategory..numeric. and MonthlyLoanPayment

Medical/Dental, Debt Consolidation, Green Loans and Home Improvement are the most expensive loans, with the highest monthly payments. Based on the earlier plot in the Univariate section, Home Improvement and Student Use loans are among the most frequently taken, and they are often taken only once in a lifetime.

IsBorrowerHomeowner and loanStatus

EmploymentStatus: The employment status of the borrower at the time they posted the listing.

In each status, more borrowers own a home than do not. In general, according to the plot, about 64% of employed borrowers own their homes. In my view this reflects the sense of independence and financial security of those who own their homes.

Income Range and ListingCategory

Student use loans are taken mostly by borrowers in the low and medium income ranges, in particular those with incomes of 25,000 - 49,999 USD and 50,000 - 74,999 USD; borrowers with 100,000 USD or more take the fewest of these loans. The plot also shows that borrowers in the high income range (above 100,000 USD) tend to take Debt Consolidation loans more than any other loan, perhaps because their high income enables them to take several loans and repay them. Borrowers in the low income range take few loans, perhaps because their income is not sufficient.

The BorrowerRate and AvailableBankcardCredit

AvailableBankcardCredit: The total available credit via bank card at the time the credit profile was pulled.

In this plot the line runs diagonally downward, showing an inverse relationship between AvailableBankcardCredit and the interest rate (BorrowerRate): as AvailableBankcardCredit increases, BorrowerRate decreases.

The MonthlyLoanPayment and StatedMonthlyIncome

The correlation between StatedMonthlyIncome and MonthlyLoanPayment is moderate: as income increases, the monthly loan payment increases too.
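
The strength of such a relationship can be quantified with a correlation coefficient. The following base R sketch uses two short invented vectors (the names and values are assumptions for illustration) and does not reproduce the coefficient from the actual data.

```r
# Made-up paired observations, for illustration only.
stated_monthly_income <- c(2500, 3750, 4583, 5200, 6100, 7500, 9000, 12000)
monthly_loan_payment  <- c(120, 127, 236, 150, 310, 280, 400, 390)

# Pearson correlation coefficient, ignoring any pairs with missing values.
r <- cor(stated_monthly_income, monthly_loan_payment, use = "complete.obs")
r

# cor.test() adds a confidence interval and a p-value for the same coefficient.
cor.test(stated_monthly_income, monthly_loan_payment)
```

A coefficient near 0 indicates little linear association and a value near 1 or -1 a strong one; where to draw the line for "moderate" is a judgment call.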

The StatedMonthlyIncome and Occupation

The highest stated monthly incomes are reported by borrowers whose occupation is Computer Programmer or Engineer - Mechanical.

BorrowerRate and Term

For the 12-month term the interest rate lies mostly between about 0.1 and 0.2 (10% and 20%), with the upper whisker at about 0.25 (25%). The 36-month term has a slightly higher interest rate than the 12-month and 60-month terms.

Monthly Loan Payment and years

From 2008 to 2011 the monthly loan payment was volatile; it jumped in 2012 and continued to rise, which I believe is due to improved living standards and increased needs.
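
A yearly trend like this requires a year variable, which can be derived from LoanOriginationQuarter. The sketch below is an assumption-laden illustration in base R: the rows are invented, the `Year` column name is hypothetical, and the quarter strings follow the "Q1 2012" format visible in the data preview.

```r
# Made-up rows; only the string format is taken from the data preview.
loans <- data.frame(
  LoanOriginationQuarter = c("Q4 2011", "Q1 2012", "Q3 2012", "Q2 2013", "Q4 2013"),
  MonthlyLoanPayment     = c(235.69, 127.34, 180.00, 310.50, 295.25),
  stringsAsFactors = FALSE
)

# Strip the leading quarter label (e.g. "Q4 ") to leave the four-digit year.
loans$Year <- as.integer(sub("^Q[1-4] ", "", loans$LoanOriginationQuarter))

# Mean monthly payment for each origination year.
payment_by_year <- aggregate(MonthlyLoanPayment ~ Year, data = loans, FUN = mean)
payment_by_year
```

The resulting table has one row per year and can be passed directly to a line or bar plot.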

ProsperRating..numeric.: The Prosper Rating assigned at the time the listing was created: 0 - N/A, 1 - HR, 2 - E, 3 - D, 4 - C, 5 - B, 6 - A, 7 - AA. Applicable for loans originated after July 2009.

Bivariate Analysis

Loan Status and Income Range
ListingCategory..numeric. and MonthlyLoanPayment
EmploymentStatus and IsBorrowerHomeowner
IncomeRange and ListingCategory..numeric.
AvailableBankcardCredit and BorrowerRate
MonthlyLoanPayment and StatedMonthlyIncome
StatedMonthlyIncome and Occupation
MonthlyLoanPayment and Term
BorrowerRate and Term
Monthly Loan Payment and years

The relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

The relationship between BorrowerRate and AvailableBankcardCredit is interesting. I had expected that borrowers with high AvailableBankcardCredit might take more risks and accept high-interest loans, but the plot shows the opposite: the high-interest loans are taken by borrowers with less AvailableBankcardCredit.
StatedMonthlyIncome and Occupation: the highest stated monthly incomes are reported by borrowers whose occupation is Computer Programmer or Engineer - Mechanical.
MonthlyLoanPayment and years: this relationship shows how the monthly loan payment increases over the years, which may reflect a rise in salaries and in the value of loans.
The strongest relationship I found is the one between AvailableBankcardCredit and the interest rate.

Multivariate Plots Section

MonthlyLoanPayment, IsBorrowerHomeowner and years.

This plot describes the relationship among year, IsBorrowerHomeowner and StatedMonthlyIncome. Borrowers with a high income tend to own their homes, but from 2013 the number of borrowers who own their home decreased and the number who do not increased.

The Employment status and Borrower’s home in years

Employed borrowers tend to own their homes; before 2011, more borrowers owned a home than did not.

The average monthly loan payment per year

The monthly loan payment increased over the years, and as the payment increased the borrower rate (interest rate) decreased.

The Prosper rating and interest rate in years

The borrower rate (interest rate) is lower for high ratings than for low ratings: a Prosper Rating of 6 (A) has an interest rate of about 0.1 (10%), and a rating of 3 (D) about 0.3 (30%).
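
A comparison of this kind can be summarized numerically by grouping on both year and rating. The base R sketch below uses invented rows; the column names `Year`, `ProsperRating` and `RatingAlpha` are assumptions, the rating labels follow the ProsperRating definition quoted earlier (1 - HR through 7 - AA), and the median is just one reasonable choice of summary.

```r
# Made-up rows for illustration only.
loans <- data.frame(
  Year          = c(2012, 2012, 2012, 2013, 2013, 2013),
  ProsperRating = c(3, 6, 3, 6, 3, 6),
  BorrowerRate  = c(0.30, 0.11, 0.28, 0.10, 0.27, 0.12)
)

# Map the numeric rating to its letter grade.
loans$RatingAlpha <- factor(
  loans$ProsperRating,
  levels = 1:7,
  labels = c("HR", "E", "D", "C", "B", "A", "AA")
)

# Median borrower rate for every year and rating combination present in the data.
rate_by_group <- aggregate(BorrowerRate ~ Year + RatingAlpha, data = loans, FUN = median)
rate_by_group
```

Each row of `rate_by_group` corresponds to one year and rating pair, which makes it easy to check whether better ratings carry lower rates in every year.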

Multivariate Analysis

I investigated the relationships among three or more variables:

MonthlyLoanPayment, IsBorrowerHomeowner and years.

The Employment status and Borrower’s home in years.

The average monthly loan payment per year.

The Prosper rating and interest rate in years.

Final plots

Plot 1.

In this plot, I studied the relationship between MonthlyLoanPayment and Years to describe how the monthly loan payment increased through the years.
Plot 2.

This plot describes the relationship among year, IsBorrowerHomeowner and StatedMonthlyIncome. Borrowers with a high income tend to own their homes.
Plot 3.

This plot shows the relationship between borrower rate, Prosper rating and years. The borrower rate (interest rate) is lower for high ratings than for low ratings: a Prosper Rating of 6 (A) has an interest rate of about 0.1 (10%), and a rating of 3 (D) about 0.3 (30%).

Reflection

I was very interested in analyzing this dataset. prosperLoanData is a dataset from Prosper, which was founded in 2005 as the first peer-to-peer lending marketplace in the United States. Since then, Prosper has facilitated more than $14 billion in loans to more than 870,000 people. The dataset contains 113,937 loans with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, borrower employment status, borrower credit history, and the latest payment information.

First I looked at the dataset using the str and summary functions to get the structure and the five-number summary of the variables. Then I read the variable definitions and searched for more information on some of the variables in order to explore them. The dataset contained missing values that needed cleaning. In the first section I installed the needed packages, loaded the libraries and removed the missing data (NAs). I ran into some bugs while coding, for example when converting between string and numeric formats, converting from numeric to factor, and extracting the dates and adding them as three separate variables.

In the Univariate section I investigated 13 variables out of 81, and to learn more about them I plotted each one using ggplot and geom layers. To remove repetitive code I created functions that made the coding easier. The second section is about the relationships between pairs of variables, for example the relationship between stated monthly income and the borrower’s occupation. The last section is about the relationships among more than two variables, to show how these variables are related.

Before this project I knew nothing about bank loans and how they work, which made the project a little difficult for me. I spent many hours researching the variables and watching videos about loans. It was a challenge, but I enjoyed exploring and analyzing this dataset.